ABSTRACT

We address the problem of compressed sensing with multiple measurement vectors associated with prior information in order to better reconstruct an original sparse signal. This problem is modeled via convex optimization with $\ell_{2,1}$-$\ell_{2,1}$ minimization. We establish bounds on the number of measurements required for successful recovery. Our bounds and geometrical interpretations reveal that if the prior information decreases the statistical dimension below its value in the case without prior information, $\ell_{2,1}$-$\ell_{2,1}$ minimization improves the recovery performance dramatically. All our findings are further verified via simulations.

Index Terms: Convex optimization, Multiple measurement vectors, Sparsity, Statistical dimension

1. INTRODUCTION 

1.1. Background and Problem Definition 

Compressive sensing (CS) [1, 2, 3] of sparse signals, which achieves simultaneous data acquisition and compression, has been extensively studied in the past few years. In this paper, we focus on multiple measurement vectors (MMVs), which are the sensing results of a set of observed signals. MMVs are increasingly applicable, especially in the areas of wireless sensor networks and wearable sensors [4, 5, 6].

Let $S = [s_1, s_2, \ldots, s_l] \in \mathbb{R}^{n \times l}$ be the matrix of $l$ ($> 1$) original signals to be sensed by a sensing matrix $\Phi \in \mathbb{R}^{m \times n}$ ($m < n$), and let the matrix of measurement vectors be $Y = [y_1, y_2, \ldots, y_l] \in \mathbb{R}^{m \times l}$, where $y_i = \Phi s_i$, $i = 1, 2, \ldots, l$. Suppose there exists an orthonormal basis $\Psi$ such that $s_i = \Psi x_i$ and $X_0 = [x_1, x_2, \ldots, x_l] \in \mathbb{R}^{n \times l}$ is $k$-joint sparse. In other words, all $x_i$'s share a common support. Given $A = \Phi\Psi$, recovery from MMVs can be efficiently solved via convex optimization as:

(Mconvex)   $\min_X f(X)$  s.t.  $Y = AX$,

where $f(\cdot)$ denotes a convex function. We say that problem (Mconvex) succeeds if it has a unique optimal solution and this solution is the ground truth $X_0$. In this paper, the convex function is chosen as $f(X) = \|X\|_{2,1}$ to enhance the joint sparsity of $X$:

(MLl)   $\min_X \|X\|_{2,1}$  s.t.  $Y = AX$.


So far, there is very limited literature on MMVs with prior information via convex optimization. In practice, some prior knowledge about the ground truth $X_0$ is often available, for example, in the problem of distributed compressive video sensing (DCVS) [7]. In DCVS, key and non-key frames are usually sampled and transmitted at higher and lower measurement rates, respectively, at the encoder, and the reconstructed key frames are then treated as prior information for better recovery of the non-key frames at the decoder. Mota et al. [8] were the first to analyze the single measurement vector (SMV) problem with prior information via convex optimization. They show that the performance can be improved provided that good prior information is available. In [9], we characterized when problem (MLl) succeeds and derived the phase transition of the success rate, inspired by the framework of conic geometry [10].

In this paper, we further extend problem (MLl) to (MLl) plus prior information:

(MLIP)   $\min_X \|X\|_{2,1} + \lambda \|X - W\|_{2,1}$  s.t.  $Y = AX$,

where $W$ is prior information associated with the ground truth $X_0$. The goal here is to provide a theoretical yet practical bound on the probability of successful recovery and to analyze the relationship between prior information and performance.
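As a concrete illustration of the objective in (MLIP), the following minimal Python sketch evaluates the $\ell_{2,1}$-norm and the prior-informed objective for small matrices stored as lists of rows. The function names `l21_norm` and `prior_objective` and the toy values are illustrative assumptions, not part of the original formulation; solving the constrained problem itself would additionally require a convex solver.

```python
import math

def l21_norm(X):
    """Sum of the Euclidean norms of the rows of X (X is a list of rows)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in X)

def prior_objective(X, W, lam):
    """Objective ||X||_{2,1} + lam * ||X - W||_{2,1} for a weight lam."""
    diff = [[x - w for x, w in zip(xr, wr)] for xr, wr in zip(X, W)]
    return l21_norm(X) + lam * l21_norm(diff)

# Toy data (illustrative only): 3 rows, 2 measurement vectors.
X0 = [[3.0, 4.0], [0.0, 0.0], [0.0, 1.0]]
W = [[3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]  # prior matches the first row exactly

plain_value = l21_norm(X0)
prior_value = prior_objective(X0, W, lam=0.5)
```

Rows on which the prior agrees with the signal contribute nothing to the second term, so only the mismatched third row is penalized here.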

1.2. Contributions of This Paper 

We summarize the contributions of our work here.

• Based on conic geometry, the phase transition of the success rate in (MLIP) is derived and is consistent with the empirical results. This study provides useful insights into how to solve the problem of MMVs with prior information.

• What prior information is “good” can be concluded from our theoretical analysis. For example, instead of giving a rough conclusion such as $\|X_0 - W\|_{2,1}$ being close to 0, we clearly show how the supports of $X - W$ and the signs of $X - W$ affect the performance.

1.3. Notations 

For a matrix $H$, we denote its transpose by $H^T$, its $i$-th row by $h^i$, its $j$-th column by $h_j$, and the $i$-th entry of the $j$-th column by $h_j^i$. $\Lambda_H := \{i : \|h^i\|_2 \neq 0\}$ is the support set that collects the indices of the nonzero rows of $H$. $\|\cdot\|_p$ and $\|\cdot\|_F$ denote the $\ell_p$-norm and the Frobenius norm, respectively. The $\ell_{p,q}$-norm of a matrix is defined as $\|X\|_{p,q} = \|(\|x^i\|_p)_{n \times 1}\|_q$. The null space of a matrix $A \in \mathbb{R}^{m \times n}$ is defined as $\mathrm{null}(A, l) = \{Z \in \mathbb{R}^{n \times l} : AZ = 0_{m \times l}\}$. Let $\mathbb{E}$ denote the expected value and let $\mathbb{B} = \{x : \|x\|_2 \le 1, x \in \mathbb{R}^n\}$ denote the closed unit ball. The dot product of two matrices is $\langle X, Y \rangle = \mathrm{tr}(X^T Y)$.

2. CONIC GEOMETRY 

To make this paper self-contained, we briefly introduce how a convex function can be characterized in terms of conic geometry. First, we introduce a cone and measure its size in the sense of statistical dimension. Then, these notions are connected with the optimality condition for the MMVs recovery problem.

Definition 2.1. (Descent cone [10])

The descent cone $\mathcal{D}(f, x)$ of a function $f : \mathbb{R}^n \to \mathbb{R}$ at a point $x \in \mathbb{R}^n$, defined as

$$\mathcal{D}(f, x) := \bigcup_{\tau > 0} \{u \in \mathbb{R}^n : f(x + \tau u) \le f(x)\},$$

is the conical hull of the perturbations that do not increase $f$ near $x$.

By the definition of the descent cone, the necessary and sufficient condition for the success of problem (MLl) is described and proved in our earlier work [9]. In this paper, however, the objective function under study is not a norm, so we modify the proof slightly to fit problem (Mconvex) with a general convex function as follows.

Lemma 2.2. (Optimality condition for MMVs recovery with general convex function)

The matrix $X_0$ is the unique optimal solution to problem (Mconvex) if and only if $\mathcal{D}(f, X_0) \cap \mathrm{null}(A, l) = \{0_{n \times l}\}$.

Proof. Assume $X_0$ is the unique optimal solution to problem (Mconvex). Given a matrix $Z \in \mathcal{D}(f, X_0) \cap \mathrm{null}(A, l)$, rescaled if necessary (both sets are cones) so that $f(X_0 + Z) \le f(X_0)$, we know that $X_0 + Z$ is a feasible point of problem (Mconvex), which implies that $X_0 + Z$ is an optimal solution to problem (Mconvex). By the uniqueness of the optimal solution of problem (Mconvex), we have $Z = 0$, and thus $\mathcal{D}(f, X_0) \cap \mathrm{null}(A, l) = \{0_{n \times l}\}$.

Conversely, suppose $\mathcal{D}(f, X_0) \cap \mathrm{null}(A, l) = \{0_{n \times l}\}$. Since $X_0$ is a feasible solution of problem (Mconvex), for any matrix $Z \in \mathrm{null}(A, l) \setminus \{0_{n \times l}\}$, $X_0 + Z$ is also feasible. If $f(X_0 + Z) \le f(X_0)$, then $Z \in \mathcal{D}(f, X_0) \cap \mathrm{null}(A, l) \setminus \{0_{n \times l}\} = \emptyset$, which is impossible. Therefore

$$f(X_0 + Z) > f(X_0) \quad \text{for all } Z \in \mathrm{null}(A, l) \setminus \{0_{n \times l}\},$$

which means that $X_0$ is the unique optimal solution to problem (Mconvex). □


Since a linear subspace is also a cone, Lemma 2.2 states that problem (Mconvex) succeeds exactly when the intersection between the descent cone at $X_0$ and the null space of $A$ is the singleton $\{0_{n \times l}\}$.

For a random sensing matrix $A$, the probability of success for problem (Mconvex) can be related to the “sizes” of the two cones in Lemma 2.2. Unfortunately, since a cone need not be a linear subspace, there is no standard definition of the size of a cone. Amelunxen et al. [10] give a way to measure the size of a cone, as described in the following.

Definition 2.3. (Statistical dimension [10])

The statistical dimension (S.D.) $\delta(C)$ of a closed convex cone $C \subseteq \mathbb{R}^n$ is defined as

$$\delta(C) := \mathbb{E}\left[\|\Pi(g, C)\|_2^2\right],$$

where $g \in \mathbb{R}^n$ is a standard normal vector and $\Pi(\cdot, C)$, denoting the Euclidean projection onto $C$, is defined as

$$\Pi(x, C) := \arg\min\{\|x - y\|_2 : y \in C\}.$$
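Definition 2.3 can be checked numerically on cones whose projection is known in closed form. The sketch below is a Monte Carlo estimate under illustrative assumptions: the cone is the nonnegative orthant in $\mathbb{R}^{10}$, whose statistical dimension is known to be $n/2$ [10], and the names `project_orthant` and `statistical_dimension_mc`, the sample size, and the seed are hypothetical choices made only for this example.

```python
import random

def project_orthant(g):
    """Euclidean projection onto the nonnegative orthant, a closed convex cone."""
    return [max(v, 0.0) for v in g]

def statistical_dimension_mc(project, n, trials=20000, seed=0):
    """Monte Carlo estimate of E[||Pi(g, C)||_2^2] for a standard normal g in R^n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += sum(v * v for v in project(g))
    return total / trials

# Estimate for the orthant in R^10; the exact value is n / 2 = 5.
estimate = statistical_dimension_mc(project_orthant, n=10)
```

For a linear subspace the same estimator approaches the ordinary dimension, which is the sense in which the S.D. extends the notion of dimension to cones.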

According to the definition of the S.D. of a cone, Amelunxen et al. [10] derive the probability that two cones, one of which is randomly rotated, intersect only at the origin, as follows.

Theorem 2.4. (Approximate kinematic formula [10])

Fix a tolerance $\eta \in (0, 1)$. Suppose that $C_1, C_2 \subseteq \mathbb{R}^n$ are closed convex cones, at least one of which is not a subspace. Draw an orthogonal matrix $Q \in \mathbb{R}^{n \times n}$ uniformly at random. Then

$$\delta(C_1) + \delta(C_2) \le n - a_\eta \sqrt{n} \implies P\{C_1 \cap QC_2 = \{0\}\} \ge 1 - \eta,$$

$$\delta(C_1) + \delta(C_2) \ge n + a_\eta \sqrt{n} \implies P\{C_1 \cap QC_2 = \{0\}\} \le \eta.$$

The quantity $a_\eta := \sqrt{8 \log(4/\eta)}$.

In order to satisfy the requirement of Theorem 2.4, $\Phi$ and $\Psi$ are chosen, as is conventional in compressive sensing, to be a Gaussian random matrix and an orthonormal basis, respectively, so that $A = \Phi\Psi$ is also a Gaussian random matrix [11]. Let $C_1 = \mathcal{D}(f, X_0)$ and let $QC_2 = \mathrm{null}(A, l)$ with the random matrix $A = \Phi\Psi$ [11]. The probability of intersection given in Theorem 2.4 can be reformulated as the probability of existence of a unique optimal solution by Lemma 2.2, i.e.,

$$P(C_1 \cap QC_2 = \{0\}) = P(\mathcal{D}(f, X_0) \cap \mathrm{null}(A, l) = \{0_{n \times l}\}) = P((\text{Mconvex}) \text{ succeeds}).$$

Since the nullity of $A$ is $n - m$ almost surely, the statistical dimension of $C_2$ is $\delta(\mathrm{null}(A, l)) = \dim(\mathrm{null}(A, l)) = (n - m)l$. Then, the probability that (Mconvex) succeeds can be estimated by Theorem 2.5, which was derived in our earlier work [9].



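Theorem 2.5. (Phase transitions in MMVs recovery)

Fix a tolerance $\eta \in (0, 1)$. Let $X_0 \in \mathbb{R}^{n \times l}$ be a fixed matrix. Suppose $A \in \mathbb{R}^{m \times n}$ has independent standard normal entries and $Y = AX_0$. Then

$$m \ge \frac{\delta(\mathcal{D}(f, X_0))}{l} + a_\eta \sqrt{\frac{n}{l}} \implies P\{(\text{Mconvex}) \text{ succeeds}\} \ge 1 - \eta;$$

$$m \le \frac{\delta(\mathcal{D}(f, X_0))}{l} - a_\eta \sqrt{\frac{n}{l}} \implies P\{(\text{Mconvex}) \text{ succeeds}\} \le \eta,$$

where the quantity $a_\eta := \sqrt{8 \log(4/\eta)}$.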
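The two thresholds on $m$ follow from applying Theorem 2.4 in the ambient dimension $nl$ with $\delta(\mathrm{null}(A, l)) = (n - m)l$ and dividing by $l$. The short Python sketch below computes them under that reading; the function names and the numerical value of the statistical dimension are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def a_eta(eta):
    """Constant a_eta = sqrt(8 * log(4 / eta)) of the approximate kinematic formula."""
    return math.sqrt(8.0 * math.log(4.0 / eta))

def measurement_thresholds(delta, n, l, eta):
    """Return (m_fail, m_succeed).

    m <= m_fail    implies success probability <= eta,
    m >= m_succeed implies success probability >= 1 - eta.
    """
    centre = delta / l                        # location of the phase transition
    width = a_eta(eta) * math.sqrt(n / l)     # half-width of the transition region
    return centre - width, centre + width

# Hypothetical statistical dimension, for illustration only.
m_fail, m_succeed = measurement_thresholds(delta=150.0, n=100, l=5, eta=0.1)
```

The transition is centred at $\delta(\mathcal{D}(f, X_0))/l$, and its half-width $a_\eta\sqrt{n/l}$ is of lower order than $n$, so it becomes relatively narrower as $n$ grows.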



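For the upper bound of $\delta(\mathcal{D}(\zeta_W, X))$, since

$$\mathbb{E}\left[\inf_{\tau \ge 0} F_G(\tau)\right] \le \inf_{\tau \ge 0} \mathbb{E}\left[F_G(\tau)\right] = \inf_{\tau \ge 0} F(\tau),$$

the result follows.

Next we aim to estimate the lower bound of $\delta(\mathcal{D}(\zeta_W, X))$. By the fact that $F_G(\tau)$ is convex on $\tau \ge 0$ and continuously differentiable on $\tau > 0$ (Lemma C.1 in [10]), we have

$$F_G(\tau) \ge F_G(\tau_0) + F_G'(\tau_0)(\tau - \tau_0), \qquad (2)$$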


3. ESTIMATION OF S.D. IN (MLIP) 

In Theorem 2.5, $\delta(\mathcal{D}(f, X_0))$ plays an important role in estimating the probability that (Mconvex) succeeds. However, calculating the exact value of the S.D. of a cone is still an open problem. In this section, we provide bounds on the S.D. of the descent cone at the point $X_0$ associated with the convex function $\zeta_W(X) = \|X\|_{2,1} + \lambda \|X - W\|_{2,1}$ in problem (MLIP), where the function $\zeta_W$ is called the $\ell_{2,1}$-norm with prior information.

Theorem 3.1. (Error bound of S.D. in (MLIP))

Let $\partial\zeta_W$ be the subdifferential of $\zeta_W$. Suppose $\partial\zeta_W(X)$ is nonempty and compact, and does not contain the origin. Then, we have


for any $\tau$ and $\tau_0$.

Let $\tau^*$ and $\tau_G$ be the minimizers of $F(\tau)$ and $F_G(\tau)$, respectively. Since $F(\tau)$ is strictly convex on $\tau \ge 0$ and differentiable on $\tau > 0$ (Lemma C.2 in [10]), the minimizer $\tau^*$ of $F(\tau)$ is unique, that is,

$$\tau^* = \arg\min_{\tau \ge 0} F(\tau).$$

Then, Eq. (2) can be written as

$$F_G(\tau_G) \ge F_G(\tau^*) + F_G'(\tau^*)(\tau_G - \tau^*)$$

($F_G'(\tau^*)$ is the right derivative provided $\tau^* = 0$). Then the expected value of $\inf_{\tau \ge 0} F_G(\tau)$ in Eq. (1) becomes


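$$\inf_{\tau \ge 0} F(\tau) - \varepsilon(X) \le \delta(\mathcal{D}(\zeta_W, X)) \le \inf_{\tau \ge 0} F(\tau),^{1}$$

where

$$\varepsilon(X) = \frac{2\|X\|_F \cdot \sup\{\|S\|_F : S \in \partial\zeta_W(X)\}}{\langle \partial\zeta_W(X), X \rangle},$$

$$F(\tau) := F(\tau, X) = \mathbb{E}\left[\mathrm{dist}^2(G, \tau \cdot \partial\zeta_W(X))\right] \quad \text{for } \tau \ge 0,$$

and $G \in \mathbb{R}^{n \times l}$ is a Gaussian random matrix.

Moreover, for a $k$-joint sparse matrix $X_0 \in \mathbb{R}^{n \times l}$, we have

$$\inf_{\tau \ge 0} F(\tau) - \frac{2(1 + \lambda)\sqrt{n}}{(1 - \lambda)\sqrt{k}} \le \delta(\mathcal{D}(\zeta_W, X_0)) \le \inf_{\tau \ge 0} F(\tau).$$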

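Proof. For any given matrix $X$, we have

$$\delta(\mathcal{D}(\zeta_W, X)) = \mathbb{E}\left[\mathrm{dist}^2(G, \mathcal{D}(\zeta_W, X)^\circ)\right],$$

where the distance function is $\mathrm{dist}(G, C^\circ) = \|\Pi(G, C)\|_F$ for a fixed cone $C$. According to Corollary 23.7.1 in [12], the polar cone can be rewritten as $\mathcal{D}(\zeta_W, X)^\circ = \bigcup_{\tau \ge 0} \tau \cdot \partial\zeta_W(X)$, thus

$$\mathbb{E}\left[\mathrm{dist}^2(G, \mathcal{D}(\zeta_W, X)^\circ)\right] = \mathbb{E}\left[\inf_{\tau \ge 0} F_G(\tau)\right], \qquad (1)$$

where

$$F_U(\tau) := F_U(\tau, X) = \mathrm{dist}^2(U, \tau \cdot \partial\zeta_W(X)), \quad \text{for } \tau \ge 0.$$

$^{1}$The upper bound of S.D. (right inequality) follows Proposition 4.1 of [10].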


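$$\begin{aligned}
\mathbb{E}\left[\inf_{\tau \ge 0} F_G(\tau)\right]
&= \mathbb{E}\left[F_G(\tau_G)\right] \\
&\ge \mathbb{E}\left[F_G(\tau^*)\right] + \mathbb{E}\left[F_G'(\tau^*)(\tau_G - \tau^*)\right] \\
&= F(\tau^*) + \mathbb{E}\left[(\tau_G - \tau^*) \cdot (F_G'(\tau^*) - \mathbb{E}[F_G'(\tau^*)])\right] + \mathbb{E}[\tau_G - \tau^*] \cdot \mathbb{E}[F_G'(\tau^*)] \\
&= F(\tau^*) + \mathbb{E}\left[(\tau_G - \mathbb{E}[\tau_G]) \cdot (F_G'(\tau^*) - \mathbb{E}[F_G'(\tau^*)])\right] + \mathbb{E}[\tau_G - \tau^*] \cdot F'(\tau^*) \\
&\ge \inf_{\tau \ge 0} F(\tau) - \left(\mathrm{Var}[\tau_G] \cdot \mathrm{Var}[F_G'(\tau^*)]\right)^{1/2} + \mathbb{E}[\tau_G - \tau^*] \cdot F'(\tau^*).
\end{aligned}$$

We can see that $\mathbb{E}[\tau_G - \tau^*] \cdot F'(\tau^*) \ge 0$, since $F'(\tau^*) = 0$ if $\tau^* > 0$ and $F'(\tau^*) \ge 0$ if $\tau^* = 0$ (because $\tau^*$ minimizes $F(\tau)$). Therefore,

$$\delta(\mathcal{D}(\zeta_W, X)) \ge \inf_{\tau \ge 0} F(\tau) - \left(\mathrm{Var}[\tau_G] \cdot \mathrm{Var}[F_G'(\tau^*)]\right)^{1/2}. \qquad (3)$$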

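Next, to compute the variance of $\tau_G$, we need to devise a consistent method for selecting a minimizer $\tau_U$ of $F_U$. Introduce the closed convex cone $C := \mathrm{cone}(\partial\zeta_W(X))$, and notice that

$$\inf_{\tau \ge 0} F_U(\tau) = \inf_{\tau \ge 0} \mathrm{dist}^2(U, \tau \cdot \partial\zeta_W(X)) = \mathrm{dist}^2(U, C).$$

In other words, the minimum distance to one of the sets $\tau \cdot \partial\zeta_W(X)$ is attained at the point $\Pi_C(U) := \arg\min\{\|U - G\|_F : G \in C\}$. As such, it is natural to pick a minimizer $\tau_U$ of $F_U$ according to the rule

$$\tau_U := \inf\{\tau \ge 0 : \Pi_C(U) \in \tau \cdot \partial\zeta_W(X)\} = \frac{\langle \Pi_C(U), X \rangle}{\langle \partial\zeta_W(X), X \rangle}. \qquad (4)$$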



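In light of Eq. (4), we have

$$|\tau_U - \tau_V| = \frac{|\langle \Pi_C(U) - \Pi_C(V), X \rangle|}{\langle \partial\zeta_W(X), X \rangle} \le \frac{\|X\|_F \cdot \|\Pi_C(U) - \Pi_C(V)\|_F}{\langle \partial\zeta_W(X), X \rangle} \le \frac{\|X\|_F \cdot \|U - V\|_F}{\langle \partial\zeta_W(X), X \rangle}.$$

We have used the fact (B.3) in [10] that the projection onto a closed convex set is nonexpansive. By the relation between $\mathrm{Var}(\tau_G)$ and the Lipschitz constant $\|X\|_F / \langle \partial\zeta_W(X), X \rangle$,

$$\left(\mathrm{Var}(\tau_G)\right)^{1/2} \le \frac{\|X\|_F}{\langle \partial\zeta_W(X), X \rangle}. \qquad (5)$$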


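By Lemma C.1 in [10],

$$\left(\mathrm{Var}(F_G'(\tau^*))\right)^{1/2} \le 2 \sup_{S \in \partial\zeta_W(X)} \|S\|_F. \qquad (6)$$

Substitute $X_0$ into Eqs. (3), (5), and (6). In Eq. (5), $\langle \partial\zeta_W(X_0), X_0 \rangle$ can be reformulated by the cosine function as $\|X_0\|_{2,1} + \lambda \sum_{i=1}^{n} \|x_0^i\|_2 \cdot \cos(\angle O x_0^i w^i)$. Hence the lower bound of $\langle \partial\zeta_W(X_0), X_0 \rangle$ is $(1 - \lambda)\|X_0\|_{2,1}$. In Eq. (6), the right-hand side $\sup\{\|S\|_F : S \in \partial\zeta_W(X_0)\}$ equals $(1 + \lambda)\sqrt{n}$ because the rows of $\partial\|X_0\|_{2,1}$ and $\partial\|X_0 - W\|_{2,1}$ are already normalized. We have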


$\partial\|X\|_{2,1} + \lambda\,\partial\|X - W\|_{2,1}$, we calculate the subgradient of $\zeta_W(X)$ according to the index sets of zero and nonzero rows of $X$ and $X - W$. We separate the domain of $\zeta_W(X)$ into four cases, where $E_1 = \Lambda_X \cap \Lambda_{X-W}$, $E_2 = \Lambda_X \cap \Lambda_{X-W}^c$, $E_3 = \Lambda_X^c \cap \Lambda_{X-W}$, and $E_4 = \Lambda_X^c \cap \Lambda_{X-W}^c$.

Then, we have the following lemma.


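Lemma 3.3. (Subdifferential of $\ell_{2,1}$-norm with prior information)

For any $X, U \in \mathbb{R}^{n \times l}$, we have

$$U \in \partial\zeta_W(X) \iff u^i \in \partial(\|x^i\|_2 + \lambda\|x^i - w^i\|_2), \quad 1 \le i \le n,$$

where

$$u^i \in \partial(\|x^i\|_2 + \lambda\|x^i - w^i\|_2) \iff
\begin{cases}
u^i = \dfrac{x^i}{\|x^i\|_2} + \lambda \dfrac{x^i - w^i}{\|x^i - w^i\|_2}, & \text{if } i \in E_1, \\
u^i = \dfrac{x^i}{\|x^i\|_2} + \lambda\beta^i, \ \|\beta^i\|_2 \le 1, & \text{if } i \in E_2, \\
u^i = \alpha^i + \lambda \dfrac{x^i - w^i}{\|x^i - w^i\|_2}, \ \|\alpha^i\|_2 \le 1, & \text{if } i \in E_3, \\
u^i = \alpha^i + \lambda\beta^i, \ \|\alpha^i\|_2 \le 1, \ \|\beta^i\|_2 \le 1, & \text{if } i \in E_4.
\end{cases}$$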
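The four index sets used in Lemma 3.3 depend only on which rows of $X$ and $X - W$ vanish. A minimal Python sketch of this partition is given below; the function names, the tolerance `tol`, and the toy matrices (which mimic $n = 100$, $k = 16$ and a prior that agrees with the signal on 4 support rows) are assumptions made for illustration.

```python
def row_is_zero(row, tol=1e-12):
    """True when every entry of the row is zero up to the tolerance."""
    return all(abs(v) <= tol for v in row)

def partition_rows(X, W, tol=1e-12):
    """Split row indices into (E1, E2, E3, E4) by whether rows of X and X - W vanish."""
    E1, E2, E3, E4 = [], [], [], []
    for i, (xr, wr) in enumerate(zip(X, W)):
        x_zero = row_is_zero(xr, tol)
        d_zero = row_is_zero([x - w for x, w in zip(xr, wr)], tol)
        if not x_zero and not d_zero:
            E1.append(i)      # signal row nonzero, prior differs
        elif not x_zero and d_zero:
            E2.append(i)      # signal row nonzero, prior exact
        elif x_zero and not d_zero:
            E3.append(i)      # signal row zero, prior nonzero (wrong support)
        else:
            E4.append(i)      # both zero
    return E1, E2, E3, E4

# Toy setting (illustrative values): n = 100 rows, k = 16 nonzero rows, 4 exact prior rows.
n, k, k_w = 100, 16, 4
X0 = [[1.0, 2.0] if i < k else [0.0, 0.0] for i in range(n)]
W = [list(X0[i]) if i < k_w else [0.0, 0.0] for i in range(n)]
sizes = [len(E) for E in partition_rows(X0, W)]
```

In this toy setting the sizes are $(|E_1|, |E_2|, |E_3|, |E_4|) = (12, 4, 0, 84)$; support rows for which the prior is zero fall into $E_1$ because $x^i - w^i = x^i \neq 0$ there.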


According to Lemma 3.3, Theorem 3.1 can be rewritten 
as follows. 


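$$\begin{aligned}
\delta(\mathcal{D}(\zeta_W, X_0))
&\ge \inf_{\tau \ge 0} F(\tau) - \frac{2\|X_0\|_F \cdot \sup\{\|S\|_F : S \in \partial\zeta_W(X_0)\}}{\langle \partial\zeta_W(X_0), X_0 \rangle} \\
&\ge \inf_{\tau \ge 0} F(\tau) - \frac{2(1 + \lambda)\sqrt{n}\,\|X_0\|_F}{(1 - \lambda)\|X_0\|_{2,1}} \\
&\ge \inf_{\tau \ge 0} F(\tau) - \frac{2(1 + \lambda)\sqrt{n}}{(1 - \lambda)\sqrt{k}},
\end{aligned}$$

where the last inequality depends on the relation between $\|X_0\|_F$ and $\|X_0\|_{2,1}$. We complete the proof. □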

To calculate the function $F(\tau)$ in Theorem 3.1, we first compute the subdifferentials of both the $\ell_{2,1}$-norm and $\zeta_W(X)$.

Lemma 3.2. (Subdifferential of $\ell_{2,1}$-norm [13])

For any $X, U \in \mathbb{R}^{n \times l}$, we have

$$U \in \partial\|X\|_{2,1} \iff u^i \in \partial\|x^i\|_2, \quad 1 \le i \le n,$$

where

$$u^i \in \partial\|x^i\|_2 \iff
\begin{cases}
u^i = x^i / \|x^i\|_2 & \text{if } x^i \neq 0, \\
\|u^i\|_2 \le 1 & \text{if } x^i = 0.
\end{cases}$$

The subgradient of the $\ell_{2,1}$-norm at $X$ is calculated row by row from the subgradient of the Euclidean norm $\|\cdot\|_2$, where $\partial\|x^i\|_2$ consists of the gradient whenever $x^i \neq 0$, and $\partial\|x^i\|_2 = \mathbb{B}$ if $x^i = 0$. That is, the computation of a subgradient of the $\ell_{2,1}$-norm at $X$ depends on whether each row of $X$ is zero or not.
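The row-wise rule of Lemma 3.2 can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and for zero rows the code picks the zero vector, which is one admissible choice among all vectors of norm at most 1.

```python
import math

def l21_norm(X):
    """Sum of the Euclidean norms of the rows of X."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in X)

def l21_subgradient(X, tol=1e-12):
    """Return one element of the subdifferential of the l_{2,1}-norm at X.

    Nonzero rows are normalised; for zero rows any vector in the unit ball is
    valid, and the zero vector is chosen here.
    """
    U = []
    for row in X:
        norm = math.sqrt(sum(v * v for v in row))
        U.append([v / norm for v in row] if norm > tol else [0.0] * len(row))
    return U

def inner(A, B):
    """Matrix inner product <A, B> = tr(A^T B)."""
    return sum(a * b for ar, br in zip(A, B) for a, b in zip(ar, br))

X = [[3.0, 4.0], [0.0, 0.0]]
U = l21_subgradient(X)
```

Any matrix returned this way satisfies the subgradient inequality $\|Z\|_{2,1} \ge \|X\|_{2,1} + \langle U, Z - X \rangle$ for every $Z$, which gives a quick sanity check.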

Moreover, since the subdifferential of $\zeta_W(X)$ can be calculated separately as $\partial(\|X\|_{2,1} + \lambda\|X - W\|_{2,1}) =$


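Theorem 3.4. (Statistical dimension of descent cone of $\ell_{2,1}$-norm with prior information)

With the same notations and assumptions as in Theorem 3.1, the S.D. of the descent cone of $\zeta_W$ at the point $X_0$ satisfies the inequality

$$\psi_p - \frac{2(1 + \lambda)\sqrt{n}}{(1 - \lambda)\sqrt{k}} \le \delta(\mathcal{D}(\zeta_W, X_0)) \le \psi_p. \qquad (7)$$

The function $\psi_p$ is defined as $\psi_p(E) := \inf_{\tau \ge 0}\{R_p(\tau, E)\}$, where $E = (|E_1|, |E_2|, |E_3|, |E_4|)$ and $R_p = T_1 + T_2 + T_3 + T_4$ with

$$T_1 = |E_1|\left(l + \tau^2(1 + \lambda^2)\right) + 2\tau^2\lambda \sum_{i \in E_1} \cos(\angle O x_0^i w^i),$$

$$T_2 = |E_2| \int_{\tau\lambda}^{\infty} (t - \tau\lambda)^2 \cdot \frac{t^{l/2} e^{-(t^2 + \tau^2)/2}}{\tau^{l/2 - 1}} I_{l/2 - 1}(\tau t)\, dt,$$

$$T_3 = |E_3| \int_{\tau}^{\infty} (t - \tau)^2 \cdot \frac{t^{l/2} e^{-(t^2 + \tau^2\lambda^2)/2}}{(\tau\lambda)^{l/2 - 1}} I_{l/2 - 1}(\tau\lambda t)\, dt,$$

$$T_4 = |E_4| \frac{2^{1 - l/2}}{\Gamma(l/2)} \int_{\tau(1 + \lambda)}^{\infty} (t - \tau(1 + \lambda))^2\, t^{l - 1} e^{-t^2/2}\, dt,$$

where $\Gamma$ is the gamma function and

$$I_\nu(x) = \sum_{j=0}^{\infty} \frac{1}{\Gamma(j + 1)\Gamma(\nu + j + 1)} \left(\frac{x}{2}\right)^{2j + \nu}$$

is the modified Bessel function of the first kind.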
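The quantity $\psi_p$ in Theorem 3.4 has no closed form, but it can be approximated numerically. The Python sketch below is one possible way to do so under stated assumptions: the densities are the standard chi and noncentral chi densities, the infinite integrals are truncated and evaluated by Simpson's rule, and the infimum over $\tau$ is located by a golden-section search on $[0, \tau_{\max}]$ (a substitute for root-finding on the derivative, justified by the convexity of $R_p$). All function names, `tau_max`, the truncation margin, and the step counts are illustrative choices, not values from the paper.

```python
import math

def chi_pdf(s, l):
    """Density of the chi distribution with l degrees of freedom."""
    if s <= 0.0:
        return 0.0
    return math.exp((1 - l / 2) * math.log(2) + (l - 1) * math.log(s)
                    - s * s / 2 - math.lgamma(l / 2))

def noncentral_chi_pdf(s, l, nc, terms=150):
    """Noncentral chi density; the Bessel series is summed term by term in logs."""
    if s <= 0.0:
        return 0.0
    if nc <= 0.0:
        return chi_pdf(s, l)
    nu, z = l / 2 - 1, nc * s
    base = -(s * s + nc * nc) / 2 + (l / 2) * math.log(s) - nu * math.log(nc)
    total = 0.0
    for j in range(terms):
        term = math.exp(base + (2 * j + nu) * math.log(z / 2)
                        - math.lgamma(j + 1) - math.lgamma(nu + j + 1))
        total += term
        if j > z / 2 and term < 1e-16 * total:
            break  # remaining terms are negligible
    return total

def plus_sq_moment(c, pdf, upper, steps=200):
    """Simpson approximation of the integral of (t - c)^2 * pdf(t) over [c, upper]."""
    h = (upper - c) / steps
    acc = 0.0
    for j in range(steps + 1):
        t = c + j * h
        weight = 1 if j in (0, steps) else (4 if j % 2 else 2)
        acc += weight * (t - c) ** 2 * pdf(t)
    return acc * h / 3

def R_p(tau, sizes, cos_sum, l, lam):
    """R_p(tau, E) = T1 + T2 + T3 + T4; cos_sum is the sum of cosines over E1."""
    e1, e2, e3, e4 = sizes
    tail = math.sqrt(l) + 8.0  # truncation margin for the infinite integrals
    t1 = e1 * (l + tau ** 2 * (1 + lam ** 2)) + 2 * tau ** 2 * lam * cos_sum
    t2 = t3 = t4 = 0.0
    if e2:
        c = tau * lam
        t2 = e2 * plus_sq_moment(c, lambda t: noncentral_chi_pdf(t, l, tau),
                                 max(c, tau) + tail)
    if e3:
        t3 = e3 * plus_sq_moment(tau, lambda t: noncentral_chi_pdf(t, l, tau * lam),
                                 max(tau, tau * lam) + tail)
    if e4:
        c = tau * (1 + lam)
        t4 = e4 * plus_sq_moment(c, lambda t: chi_pdf(t, l), c + tail)
    return t1 + t2 + t3 + t4

def psi_p(sizes, cos_sum, l, lam, tau_max=5.0, iters=40):
    """Approximate the infimum of R_p over [0, tau_max] by golden-section search."""
    ratio = (math.sqrt(5) - 1) / 2
    lo, hi = 0.0, tau_max
    a, b = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    fa, fb = R_p(a, sizes, cos_sum, l, lam), R_p(b, sizes, cos_sum, l, lam)
    for _ in range(iters):
        if fa < fb:
            hi, b, fb = b, a, fa
            a = hi - ratio * (hi - lo)
            fa = R_p(a, sizes, cos_sum, l, lam)
        else:
            lo, a, fa = a, b, fb
            b = lo + ratio * (hi - lo)
            fb = R_p(b, sizes, cos_sum, l, lam)
    return min(fa, fb)
```

For instance, `psi_p((12, 4, 0, 84), 12.0, 2, 0.5)` evaluates the bound for a hypothetical setting with 4 exactly known support rows, where the 12 remaining support rows have a zero prior and hence cosine 1; dividing the result by $l$ gives the predicted location of the phase transition in $m$.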



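Proof. First we separate $F_G(\tau)$ as follows:

$$\begin{aligned}
\mathrm{dist}^2(G, \tau \cdot \partial\zeta_W(X_0))
&= \sum_{i \in E_1} \left\| g^i - \tau \cdot \left( \frac{x_0^i}{\|x_0^i\|_2} + \lambda \frac{x_0^i - w^i}{\|x_0^i - w^i\|_2} \right) \right\|_2^2 & (8) \\
&+ \sum_{i \in E_2} \inf_{\beta^i \in \mathbb{B}} \left\| g^i - \tau \cdot \left( \frac{x_0^i}{\|x_0^i\|_2} + \lambda\beta^i \right) \right\|_2^2 & (9) \\
&+ \sum_{i \in E_3} \inf_{\alpha^i \in \mathbb{B}} \left\| g^i - \tau \cdot \left( \alpha^i + \lambda \frac{x_0^i - w^i}{\|x_0^i - w^i\|_2} \right) \right\|_2^2 & (10) \\
&+ \sum_{i \in E_4} \inf_{\alpha^i, \beta^i \in \mathbb{B}} \left\| g^i - \tau \cdot (\alpha^i + \lambda\beta^i) \right\|_2^2. & (11)
\end{aligned}$$

In Eq. (8), for each $i \in E_1$, let $\gamma^i = \dfrac{x_0^i}{\|x_0^i\|_2} + \lambda \dfrac{x_0^i - w^i}{\|x_0^i - w^i\|_2}$.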

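By taking the expected value of Eq. (8), together with the fact that $g_j^i \sim N(0, 1)$, we have

$$\begin{aligned}
\mathbb{E}\left[\sum_{i \in E_1} \|g^i - \tau\gamma^i\|_2^2\right]
&= \mathbb{E}\left[\sum_{i \in E_1} \sum_{j=1}^{l} (g_j^i - \tau\gamma_j^i)^2\right] \\
&= \mathbb{E}\left[\sum_{i \in E_1} \sum_{j=1}^{l} \left((g_j^i)^2 - 2\tau g_j^i \gamma_j^i + \tau^2 (\gamma_j^i)^2\right)\right] \\
&= \sum_{i \in E_1} \sum_{j=1}^{l} \left(\mathbb{E}[(g_j^i)^2] - 2\tau\gamma_j^i\,\mathbb{E}[g_j^i] + \tau^2 (\gamma_j^i)^2\right) \\
&= \sum_{i \in E_1} \sum_{j=1}^{l} \left(1 + \tau^2 (\gamma_j^i)^2\right) \\
&= |E_1|\left(l + \tau^2(1 + \lambda^2)\right) + 2\tau^2\lambda \sum_{i \in E_1} \cos(\angle O x_0^i w^i) \\
&= T_1.
\end{aligned}$$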

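In Eq. (9), for each $i \in E_2$, let $\gamma^i = g^i - \tau \dfrac{x_0^i}{\|x_0^i\|_2}$; the minimization problem can be written as

$$\inf_{\beta^i \in \mathbb{B}} \|\gamma^i - \tau\lambda\beta^i\|_2. \qquad (12)$$

We can see that the optimal value is 0 provided $\|\gamma^i\|_2 \le \tau\lambda$. In the case $\|\gamma^i\|_2 > \tau\lambda$, the optimal solution is $\beta^i = \gamma^i / \|\gamma^i\|_2$ with optimal value $\|\gamma^i\|_2 - \tau\lambda$. That is, the optimal value of Eq. (12) is

$$\inf_{\beta^i \in \mathbb{B}} \|\gamma^i - \tau\lambda\beta^i\|_2 =
\begin{cases}
0 & \text{if } \|\gamma^i\|_2 \le \tau\lambda, \\
\|\gamma^i\|_2 - \tau\lambda & \text{if } \|\gamma^i\|_2 > \tau\lambda,
\end{cases}$$

and hence Eq. (9) becomes

$$\sum_{i \in E_2} \inf_{\beta^i \in \mathbb{B}} \|\gamma^i - \tau\lambda\beta^i\|_2^2 = \sum_{i \in E_2} \left(\|\gamma^i\|_2 - \tau\lambda\right)_+^2. \qquad (13)$$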


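Eq. (10) and Eq. (11) are handled similarly. In Eq. (10), for each $i \in E_3$, let $\gamma^i = g^i - \tau\lambda \dfrac{x_0^i - w^i}{\|x_0^i - w^i\|_2}$; the minimization problem can be written as

$$\inf_{\alpha^i \in \mathbb{B}} \|\gamma^i - \tau\alpha^i\|_2,$$

whose optimal value is

$$\inf_{\alpha^i \in \mathbb{B}} \|\gamma^i - \tau\alpha^i\|_2 =
\begin{cases}
0 & \text{if } \|\gamma^i\|_2 \le \tau, \\
\|\gamma^i\|_2 - \tau & \text{if } \|\gamma^i\|_2 > \tau,
\end{cases}$$

and hence Eq. (10) becomes

$$\sum_{i \in E_3} \inf_{\alpha^i \in \mathbb{B}} \|\gamma^i - \tau\alpha^i\|_2^2 = \sum_{i \in E_3} \left(\|\gamma^i\|_2 - \tau\right)_+^2. \qquad (14)$$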

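In Eq. (11), for each $i \in E_4$, the optimal value of the minimization problem is

$$\inf_{\alpha^i, \beta^i \in \mathbb{B}} \|g^i - \tau \cdot (\alpha^i + \lambda\beta^i)\|_2 =
\begin{cases}
0 & \text{if } \|g^i\|_2 \le \tau(1 + \lambda), \\
\|g^i\|_2 - \tau(1 + \lambda) & \text{if } \|g^i\|_2 > \tau(1 + \lambda),
\end{cases}$$

and hence Eq. (11) becomes

$$\sum_{i \in E_4} \inf_{\alpha^i, \beta^i \in \mathbb{B}} \|g^i - \tau \cdot (\alpha^i + \lambda\beta^i)\|_2^2 = \sum_{i \in E_4} \left(\|g^i\|_2 - \tau(1 + \lambda)\right)_+^2. \qquad (15)$$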

Next, we discuss the expected values of Eqs. (13)-(15). For Eq. (13), let $S_{2,i} = \|\gamma^i\|_2$ for $i \in E_2$. Since $g_j^i \sim N(0, 1)$, $S_{2,i}$ follows the noncentral chi distribution with the same degrees of freedom $l$ and the same noncentrality parameter $\tau$ (the norm of the mean of $\gamma^i$) for all $i \in E_2$, which implies that all $S_{2,i}$ have the same probability density function

$$p(S_{2,i} = s; l, \tau) = \frac{s^{l/2} e^{-(s^2 + \tau^2)/2}}{\tau^{l/2 - 1}} I_{l/2 - 1}(\tau s).$$


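By taking the expected value, we have

$$\begin{aligned}
\sum_{i \in E_2} \mathbb{E}\left[(S_{2,i} - \tau\lambda)_+^2\right]
&= \sum_{i \in E_2} \int_{\tau\lambda}^{\infty} (t - \tau\lambda)^2 \cdot p(t; l, \tau)\, dt \\
&= |E_2| \int_{\tau\lambda}^{\infty} (t - \tau\lambda)^2 \cdot p(t; l, \tau)\, dt \\
&= T_2.
\end{aligned}$$

Similarly, $S_{3,i} = \|\gamma^i\|_2$ follows the noncentral chi distribution with the same degrees of freedom $l$, the same noncentrality parameter $\tau\lambda$, and the same probability density function $p(S_{3,i} = s; l, \tau\lambda)$ for all $i \in E_3$.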



Then, by taking the expected value, Eq. (14) becomes

$$\begin{aligned}
\sum_{i \in E_3} \mathbb{E}\left[(S_{3,i} - \tau)_+^2\right]
&= \sum_{i \in E_3} \int_{\tau}^{\infty} (t - \tau)^2 \cdot p(t; l, \tau\lambda)\, dt \\
&= |E_3| \int_{\tau}^{\infty} (t - \tau)^2 \cdot p(t; l, \tau\lambda)\, dt \\
&= T_3.
\end{aligned}$$

For Eq. (15), $S_{4,i} = \|g^i\|_2$ follows the chi distribution with the same degrees of freedom $l$ and the same probability density function

$$p(S_{4,i} = s; l) = \frac{2^{1 - l/2} s^{l - 1} e^{-s^2/2}}{\Gamma(l/2)}$$

for all $i \in E_4$. Then, Eq. (15) can be reformulated as:

$$\begin{aligned}
\sum_{i \in E_4} \mathbb{E}\left[(\|g^i\|_2 - \tau(1 + \lambda))_+^2\right]
&= \sum_{i \in E_4} \int_{\tau(1 + \lambda)}^{\infty} (t - \tau(1 + \lambda))^2 \cdot p(t; l)\, dt \\
&= |E_4| \int_{\tau(1 + \lambda)}^{\infty} (t - \tau(1 + \lambda))^2 \cdot p(t; l)\, dt \\
&= T_4.
\end{aligned}$$

Therefore,

$$\mathbb{E}\left[\mathrm{dist}^2(G, \tau \cdot \partial\zeta_W(X_0))\right] = R_p(\tau, E),$$

and we complete the proof. □

Following Theorem 3.4, since $R_p$ is strictly convex, the infimum can be computed by finding the root of the derivative of $R_p$. Moreover, if we divide the inequality in Eq. (7) by $n$, we can see that the error term $\dfrac{2(1 + \lambda)}{(1 - \lambda)\sqrt{nk}}$ decreases as $n$ grows. That is, the error term is negligible when $n$ is large enough. We verify Theorem 3.4 in the next section.

4. VERIFICATION

In this section, we verify our theoretical analysis of the phase transition in compressive sensing via $\ell_{2,1}$-$\ell_{2,1}$ minimization through simulations, which were conducted using the CVX package [14]. Based on Theorem 3.4, the S.D. is closely related to $\psi_p$, which is dominated by $E$ and by $\cos(\angle O x_0^i w^i)$, named the cosine term. Hence, our simulations are divided into three categories: (1) examine how prior information, controlled by $|E_2|$, improves the performance; (2) verify how prior information with correct supports but imprecise values, controlled by $|E_1|$ and the cosine term, affects the performance; and (3) examine how prior information with wrong supports, controlled by $|E_3|$, affects the performance. All the parameters in the three simulations follow the setting described in the next subsection.

4.1. Parameter Setting

For parameter setting, the signal dimension was fixed at $n = 100$ and the sparsity was set to $k = 16$. The number of measurement vectors was $l$. Since the performance does not change once the length $m$ of a measurement vector exceeds a certain value in all simulations, $m$ was restricted to range from 1 up to that value in order to focus on the phase transition of performance. In our simulations, we construct a signal matrix $X_0 \in \mathbb{R}^{n \times l}$ with $k$ nonzero rows and generate prior information $W$ with $k_W$ nonzero rows satisfying $w^i = x_0^i$, $\forall i \in \Lambda_W \subseteq \Lambda_{X_0}$.

4.2. Prior Information Controlled by $|E_2|$


In the first simulation, $k_W$ is 4 or 8 and $l$ is 2 or 5. The following procedure (Steps 1-3) was repeated 100 times for each set of parameters, composed of $l$ and $k_W$.

Step 1 Draw a standard normal matrix $A \in \mathbb{R}^{m \times n}$ and generate $Y = AX_0$.

Step 2 Solve problem (MLIP) by CVX to obtain an optimal solution $X^*$.

Step 3 Declare success if ||X* — Xo||f < lO”®. 

As described in Theorem 3.4, $\delta(\mathcal{D}(\zeta_W, X_0))$ depends on $n$, $E$, $l$, and $\lambda$. By the definition in Theorem 3.4, $|E_1| = 12$ and $|E_2| = 4$ in Fig. 1(a) and (c), and $|E_1| = |E_2| = 8$ in Fig. 1(b) and (d), regardless of whether $l$ equals 2 or 5. In Fig. 1, the theoretical curve (in black), indicating the threshold $\delta(\mathcal{D}(\zeta_W, X_0))/l$ derived from Theorem 2.5, is located in the transition region (separating success from failure) of the empirical recovery results (in blue). We can observe that the theoretical results (in black) and the empirical results (in blue) in Fig. 1(b) are closer to the origin than those in Fig. 1(a) because $|E_2|$ in (b) is greater than $|E_2|$ in (a); in other words, more correct supports (i.e., larger $k_W$) are available. Similar results can also be observed in Figs. 1(c) and (d), where $l$ is larger. In addition, both the theoretical and empirical results in Figs. 1(c) and (d) are closer to the origin than those in Figs. 1(a) and (b) because a larger $l$ is used. Such phenomena are reasonable because more prior information is helpful in the recovery of sparse signals.

4.3. Prior Information with Correct Supports but Imprecise Values

We discuss how much the cosine term influences the S.D. and the performance. This is equivalent to exploring the similarity between $X_0$ and $W$. The parameters were $l = 5$ and $k_W = 8$. We construct a matrix $X_0 \in \mathbb{R}^{100 \times 5}$ with $k = 16$ nonzero rows and generate prior information $W$ with $k_W = 8$ nonzero rows, where $\Lambda_W \subseteq \Lambda_X$ is chosen. We repeat the procedure (Steps 1-3) 100 times for four types of prior information, described as follows.

Type 1. $w^i \sim N(0, I_{5 \times 5})$, $\forall i \in \Lambda_W$.

Type 2. $w^i = \mathrm{sign}(x^i)$, $\forall i \in \Lambda_W$.

Type 3. $w^i = (\mu + 3\sigma) \cdot \mathrm{sign}(x^i)$, $\forall i \in \Lambda_W$, where $\mu$ and $\sigma$ are the mean and standard deviation of $x^i$, respectively.

Type 4. $w^i = x^i$, $\forall i \in \Lambda_W \subseteq \Lambda_X$.

Fig. 1. The empirical probability that problem (MLIP) recovers a sparse signal matrix with the help of prior information $W$: (a) $k_W = 4$ and $l = 2$; (b) $k_W = 8$ and $l = 2$; (c) $k_W = 4$ and $l = 5$; (d) $k_W = 8$ and $l = 5$, given random linear measurements $Y = AX_0$.

Fig. 2. The empirical probability that problem (MLIP) identifies a sparse matrix with $l$ measurement vectors under prior information $W$: (a) Type 1; (b) Type 2; (c) Type 3; (d) Type 4.

Fig. 2 shows the results for the four types of prior information under $n = 100$ and $k_W = 8$, which are summarized as follows: (1) As shown in Fig. 2(a), Type 1 makes the cosine term $\cos(\angle O x_0^i w^i)$ unpredictable, but it is expected to be the highest among the four types and to cause the worst performance. (2) In Fig. 2(b), $W$ only has correct signs, so it cannot be ensured whether $\cos(\angle O x_0^i w^i)$ is greater than or less than 0. However, the correct direction still improves the performance. (3) In Fig. 2(c), $W$ has the same signs as the original signal and satisfies $|x_j^i| < |w_j^i|$ for $i \in \Lambda_W$ and $1 \le j \le l$ with probability as high as 99%. These properties make the cosine term less than 0 and lead to better performance. (4) Since Type 4 carries the best prior information, Fig. 2(d) exhibits the upper bound of performance.
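The effect of the prior type on the cosine term can be illustrated for a single row. In the Python sketch below, the cosine term is computed as the cosine of the angle at vertex $x$ between the origin and $w$, that is, the cosine between $x$ and $x - w$. The row values are arbitrary, and Type 3 is instantiated with $\mu = 0$ and $\sigma = 1$; both are assumptions made only for this example, as are the function names.

```python
import math

def cosine_term(x, w):
    """Cosine of the angle at vertex x between the origin and w.

    This equals the cosine between x and x - w; None is returned when x or
    x - w is zero, since such a row does not belong to E1.
    """
    d = [a - b for a, b in zip(x, w)]
    nx = math.sqrt(sum(a * a for a in x))
    nd = math.sqrt(sum(a * a for a in d))
    if nx == 0.0 or nd == 0.0:
        return None
    return sum(a * b for a, b in zip(x, d)) / (nx * nd)

def sign(v):
    """Sign of a real number as -1, 0, or 1."""
    return (v > 0) - (v < 0)

x = [0.5, -1.2, 2.0, -0.3, 0.8]            # one illustrative row of X0 with l = 5
w_none = [0.0] * len(x)                    # no prior on this row
w_type2 = [float(sign(v)) for v in x]      # Type 2: correct signs only
w_type3 = [3.0 * sign(v) for v in x]       # Type 3, assuming mu = 0 and sigma = 1
w_type4 = list(x)                          # Type 4: exact values

cos_none = cosine_term(x, w_none)
cos_type2 = cosine_term(x, w_type2)
cos_type3 = cosine_term(x, w_type3)
cos_type4 = cosine_term(x, w_type4)
```

With no prior the cosine equals 1, the Type 3 prior overshoots every entry and yields a negative cosine, and the Type 4 row leaves $E_1$ altogether, which is consistent with the ordering of performance described above.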

4.4. Prior Information with Wrong Supports 

For the last simulation, we verify whether the effect of prior information with wrong supports is correctly predicted by Theorem 3.4. The parameters were set as $l = 5$ and $k_W = 8$. Prior information of Type 3 was considered here. Next, we randomly choose some $i \in \Lambda_X^c$ and set $w^i \sim N(0, I_{l \times l})$. The procedure (Steps 1-3) was repeated 100 times for each pair of parameters, $m$ and $k_W$, under four cases with different numbers of wrong supports in the prior information. As shown in Fig. 3, they are $|E_3| = 6$, $|E_4| = 78$ in (a); $|E_3| = 12$, $|E_4| = 72$ in (b); $|E_3| = 18$, $|E_4| = 66$ in (c); and $|E_3| = 24$, $|E_4| = 60$ in (d).

To compare with the case without prior information, the results regarding $\delta(\mathcal{D}(\|\cdot\|_{2,1}, X_0))$ are shown as a red line in Fig. 3. In Fig. 3(a), although $|E_3| = 6$, $W$ still carries 8 correct supports; overall, the S.D. with such $W$ is still much lower than the red line. In Fig. 3(b), $|E_3|$ increases to 12, and the S.D. with such $W$ is almost the same as the red line. In Fig. 3(c) and (d), as $|E_3|$ increases, the performance degrades and the blue line is even higher than the red line; in other words, $\ell_{2,1}$-norm minimization without prior information gives better performance.


5. CONCLUSION 

In view of the fact that the phase transition analysis of joint-sparse signal recovery with prior information in compressive sensing is relatively unexplored, we have presented a new phase transition analysis based on conic geometry to characterize the effect of prior information for MMVs. Our study provides useful insights into the critical problem of selecting prior information to guarantee improvement of signal recovery in the context of compressive sensing.




























Fig. 3. The empirical probability that problem (MLIP) identifies a sparse matrix with prior information $W$ and $l$ measurement vectors: (a) $k_W = 14$ with 6 wrong supports; (b) $k_W = 20$ with 12 wrong supports; (c) $k_W = 26$ with 18 wrong supports; (d) $k_W = 32$ with 24 wrong supports, given random linear measurements $Y = AX_0$. All four panels use $l = 5$.